On Consistent Checkpointing in Distributed Systems

Authors

Guohong Cao

Mukesh Singhal

Abstract

Consistent checkpointing simplifies failure recovery and eliminates the domino effect in case of failure by preserving a consistent global checkpoint on stable storage. However, the approach suffers from high overhead associated with the checkpointing process. Two approaches are used to reduce this overhead: one is to minimize the number of synchronization messages and the number of checkpoints; the other is to make the checkpointing process non-blocking. These two approaches had been pursued separately until the Prakash-Singhal algorithm [17] combined them: it forces only a minimum number of processes to take checkpoints, and it does not block the underlying computation. In this paper, we identify two problems in their algorithm [17] and prove that there does not exist a non-blocking algorithm that forces only a minimum number of processes to take their checkpoints. Based on this proof, we present an efficient algorithm that neither forces all processes to take checkpoints nor blocks the underlying computation during checkpointing. Correctness proofs are also provided.
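The abstract rests on two notions: a consistent global checkpoint, which contains no orphan message (a message recorded as received by its receiver but not recorded as sent by its sender), and the set of processes that must checkpoint when one process initiates. The Python sketch below illustrates both under simplifying assumptions. It models each checkpoint as a count of recorded local events, and it assumes, as in common dependency-tracking formulations, that a process must checkpoint if the initiator transitively depends on it, with dependencies supplied as a static map. The names `Message`, `is_consistent`, and `minimum_checkpoint_set` are hypothetical. This is not the Prakash-Singhal algorithm or the algorithm proposed in this paper, and it ignores concurrency, in-transit messages, and blocking entirely.

```python
from collections import deque
from typing import Dict, Iterable, NamedTuple, Set


class Message(NamedTuple):
    sender: str
    send_seq: int    # index of the send event in the sender's local history
    receiver: str
    recv_seq: int    # index of the receive event in the receiver's local history


def is_consistent(cut: Dict[str, int], messages: Iterable[Message]) -> bool:
    """cut[p] is the number of p's local events recorded in p's checkpoint.

    The cut is consistent iff no message is orphan, i.e. recorded as
    received by its receiver but not recorded as sent by its sender.
    """
    for m in messages:
        received = m.recv_seq < cut[m.receiver]
        sent = m.send_seq < cut[m.sender]
        if received and not sent:
            return False  # orphan message found
    return True


def minimum_checkpoint_set(initiator: str, depends_on: Dict[str, Set[str]]) -> Set[str]:
    """Return the initiator plus every process it transitively depends on.

    depends_on[p] holds the processes p has received messages from since
    its last checkpoint (an assumed, statically known dependency map).
    """
    required = {initiator}
    frontier = deque([initiator])
    while frontier:  # breadth-first traversal of the dependency graph
        p = frontier.popleft()
        for q in depends_on.get(p, set()):
            if q not in required:
                required.add(q)
                frontier.append(q)
    return required


# Usage: P3 sends to P2, then P2 sends to P1; P4 exchanges no messages.
messages = [Message("P3", 0, "P2", 0), Message("P2", 1, "P1", 0)]
depends_on = {"P1": {"P2"}, "P2": {"P3"}, "P3": set(), "P4": set()}

print(sorted(minimum_checkpoint_set("P1", depends_on)))  # ['P1', 'P2', 'P3']
# Only P1 checkpoints: its receipt of P2's message becomes an orphan.
print(is_consistent({"P1": 1, "P2": 0, "P3": 0, "P4": 0}, messages))  # False
# P1, P2 and P3 checkpoint; P4 keeps its initial checkpoint.
print(is_consistent({"P1": 1, "P2": 2, "P3": 1, "P4": 0}, messages))  # True
```

In this toy run, P4 never has to checkpoint, which is the saving a minimum-process scheme aims for; the paper's result concerns whether that saving can be achieved without blocking, which this static sketch does not model.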

Similar resources

Necessary and sufficient conditions for transaction-consistent global checkpoints in a distributed database system

Checkpointing and rollback recovery are well-known techniques for handling failures in distributed systems. The issues related to the design and implementation of efficient checkpointing and recovery techniques for distributed systems have been thoroughly understood. For example, the necessary and sufficient conditions for a set of checkpoints to be part of a consistent global checkpoint has be...

Transaction-Consistent Global Checkpoints in a Distributed Database System

Checkpointing and rollback recovery are well-known techniques for handling failures in distributed database systems. In this paper, we establish the necessary and sufficient conditions for the checkpoints on a set of data items to be part of a transaction-consistent global checkpoint of the distributed database. This can throw light on designing efficient, non-intrusive checkpointing techniques...

Review of Some Checkpointing Schemes for Distributed and Mobile Computing Environments

Mr Raman Kumar, Mewar University, Chittorgarh (Raj); Dr Parveen Kumar, Amity University, Gurgaon (Haryana). Abstract: Fault Tolerance Techniques facilitate systems to carry out tasks in the incidence of faults. A checkpoint is a...

An Index-Based Checkpointing Algorithm for Autonomous Distributed Systems

This paper presents an index based checkpointing algorithm for distributed systems with the aim of reducing the total number of checkpoints while ensuring that each checkpoint belongs to at least one consistent global checkpoint or recovery line. The algorithm is based on an equivalence relation defined between pairs of successive checkpoints of a process which allows in some cases to advance the...

The Performance of Consistent Checkpointing in Distributed Shared Memory Systems

This paper presents the design and implementation of a consistent checkpointing scheme for Distributed Shared Memory (DSM) systems. Our approach relies on the integration of checkpoints within synchronization barriers already existing in applications; this avoids the need to introduce an additional synchronization mechanism. The main advantage of our checkpointing mechanism is that performance...

An optimistic checkpointing and message logging approach for consistent global checkpoint collection in distributed systems

Checkpointing and rollback recovery are widely used techniques for achieving fault-tolerance in distributed systems. In this paper, we present a novel checkpointing algorithm which has the following desirable features: A process can independently initiate consistent global checkpointing by saving its current state, called a tentative checkpoint. Other processes come to know about a consistent g...

Publication date: 1997